
PatchedArray: basics and wiring #7002

Open
a10y wants to merge 27 commits into develop from aduffy/patched-array

Conversation

@a10y
Contributor

@a10y a10y commented Mar 17, 2026

Summary

This is the first PR in a series addressing the PatchedArray RFC: vortex-data/rfcs#27

This PR adds a new PatchedArray array variant, which is slated to be used only with FastLanes array types. The design is largely documented in the RFC; briefly, it:

  • Wraps another child array
  • Holds a buffer of patch offsets and patch values, both sorted by chunk and lane, so there is efficient random as well as bulk access from CUDA

We are able to push down the following at reduce time:

  • Filters push through to the child, and we trim the patch indices to the range of chunks covered by the filter
  • Slicing, which reduces the chunk range
  • Compare always pushes through to the child
  • Take pushes through to the inner array, then constructs a hashmap of the patches

There will be follow-ups to add the wiring into CUDA and to update how BitPacked and ALP arrays are written.

Testing

There are unit tests for all of the reducers and kernels

@codspeed-hq

codspeed-hq bot commented Mar 17, 2026

Merging this PR will degrade performance by 24.45%

❌ 9 regressed benchmarks
✅ 1097 untouched benchmarks
⏩ 1522 skipped benchmarks¹

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation old_alp_prim_test_between[f32, 32768] 398.1 µs 444.5 µs -10.44%
Simulation new_bp_prim_test_between[i32, 16384] 101.5 µs 116.2 µs -12.67%
Simulation take_10k_contiguous 270.9 µs 329 µs -17.65%
Simulation new_bp_prim_test_between[i32, 32768] 149.1 µs 178.3 µs -16.38%
Simulation new_bp_prim_test_between[i64, 16384] 118.3 µs 147.9 µs -20.03%
Simulation take_10k_random 209.1 µs 267.3 µs -21.78%
Simulation new_bp_prim_test_between[i64, 32768] 182.4 µs 241.4 µs -24.45%
Simulation new_alp_prim_test_between[f64, 16384] 131.2 µs 153.3 µs -14.45%
Simulation old_bp_prim_test_between[i64, 32768] 473.4 µs 535.5 µs -11.6%

Comparing aduffy/patched-array (5412a08) with develop (3c5fbd3)

Open in CodSpeed

Footnotes

  1. 1522 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@a10y a10y added the changelog/feature A new feature label Mar 18, 2026
@a10y a10y marked this pull request as ready for review March 18, 2026 21:21
@a10y a10y requested a review from joseph-isaacs March 18, 2026 21:21
@a10y a10y force-pushed the aduffy/patched-array branch from c3d7434 to 6c1d7aa Compare March 19, 2026 13:24
@a10y a10y force-pushed the aduffy/patched-array branch from 860cc64 to 042394b Compare March 24, 2026 19:17
@a10y a10y force-pushed the aduffy/patched-array branch 4 times, most recently from df8dbf5 to 862d814 Compare March 26, 2026 19:54
@a10y
Contributor Author

a10y commented Mar 27, 2026

@claude review please

@claude

This comment was marked as resolved.

@a10y
Contributor Author

a10y commented Mar 27, 2026

@claude i have addressed your main comments. please give another review.

@claude

This comment was marked as resolved.

@a10y a10y force-pushed the aduffy/patched-array branch from daf258f to 578d83d Compare March 27, 2026 14:35
@a10y
Contributor Author

a10y commented Mar 27, 2026

@claude please review

@claude

This comment was marked as resolved.

@a10y
Contributor Author

a10y commented Mar 27, 2026

@claude PR is ready for another review

@claude
Contributor

claude bot commented Mar 27, 2026

Claude finished @a10y's task in 8m 28s


PR Review: PatchedArray basics and wiring

Tasks:

  • Read PR context and understand the changes
  • Read core array implementation files
  • Read compute kernel implementations
  • Read vtable implementation
  • Read test/compat fixtures
  • Provide comprehensive review feedback

Overall this is solid work — the design is clean, the test coverage is good, and the key operations (slice, filter, take, compare, scalar_at) are all well-structured. I found three correctness bugs of varying severity. Details below.


🔴 Bug 1: build() computes n_chunks incorrectly for sliced arrays

File: vortex-array/src/arrays/patched/vtable/mod.rs, line 246

let n_chunks = len.div_ceil(1024);  // WRONG for offset > 0

When a PatchedArray is sliced at a non-chunk boundary, array.offset > 0. For example, a slice from element 5 to element 1025 produces len = 1020, offset = 5, but the array spans 2 chunks. The lane_offsets buffer has n_chunks_actual * n_lanes + 1 = 2 * n_lanes + 1 entries, but build() reconstructs with n_chunks = 1020.div_ceil(1024) = 1, causing all downstream chunk/lane indexing to be wrong after a round-trip through serialization.

Fix: let n_chunks = (len + metadata.offset as usize).div_ceil(1024);

This matches how slice_chunks counts chunks and how SliceReduce selects chunk boundaries. All the invariants hold because n_chunks is always derivable from (len + offset).



🔴 Bug 2: Missing upper-bounds check in compare.rs's ApplyPatches::apply

File: vortex-array/src/arrays/patched/compute/compare.rs, lines 139–149

let bit_index = chunk * 1024 + patch_index as usize;
if bit_index < self.offset {
    continue;  // lower-bound guard ✓
}
let bit_index = bit_index - self.offset;
// No upper-bound check — can write past end of bits buffer!
if cmp(patch_value, self.constant) {
    self.bits.set(bit_index)
} else {
    self.bits.unset(bit_index)
}

The bits buffer has lhs.len bits. A patch in the final chunk at a position beyond offset + len (which can easily happen when len is not a multiple of 1024 — e.g., 2048-element array with a patch at index 2047 sliced to [0, 2040)) causes bit_index = 2047 - 0 = 2047 >= 2040, an out-of-bounds write.

Fix: Add if bit_index >= self.bits.len() { continue; } (or use self.offset + lhs_len as an explicit upper bound) after computing bit_index - self.offset. The equivalent guard already exists in apply_patches_primitive (line 369 of vtable/mod.rs).



🔴 Bug 3: Double-offset in slice_chunks when chunks.start == 0 and offset > 0

File: vortex-array/src/arrays/patched/array.rs, lines 138–168

let begin = (chunks.start * 1024).max(self.offset);  // = self.offset when start=0
let inner = self.inner.slice(begin..end)?;            // inner[0] = original[self.offset]
let offset = begin % 1024;                            // = self.offset again

When chunks.start == 0 and self.offset > 0 (i.e. filtering a non-chunk-boundary-sliced array that doesn't prune front chunks), the inner gets sliced by self.offset elements AND the new offset field is still self.offset. During execute(), buffer.slice_typed(offset..offset+len) then skips another self.offset elements, effectively double-counting.

This path is reachable via filter.rs: when chunk_start == 0 && chunk_stop < n_chunks, slice_chunks(0..chunk_stop) is called. The existing test_filter_sliced test avoids this because it slices at chunk boundary 1024 (offset=0).

Fix: Distinguish the two cases:

  • When chunks.start > 0: begin = chunks.start * 1024, new offset = 0 (chunk-aligned start)
  • When chunks.start == 0: begin = 0, new offset = self.offset, new len = min(chunks.end * 1024, self.offset + self.len) - self.offset



🟡 Memory: slice_chunks retains all patch data

File: array.rs, lines 145–146

let indices = self.indices.clone();
let values = self.values.clone();

After slice_chunks, the indices and values arrays still hold patches for the entire original array, not just the retained chunk range. The lane_offsets correctly limits access, so this is not a correctness issue, but after a selective filter on a large array, all patches are still in memory. Worth a // TODO: trim indices and values to the patch range of lane_offsets[0]..lane_offsets[last] comment at least.


🟡 take_map builds the full patch hashmap even for selective takes

File: compute/take.rs, lines 92–107

for chunk in 0..n_chunks {
    for lane in 0..n_lanes {
        // ... inserts all patches into hashmap
    }
}

This scans all patches regardless of which elements are being taken. For a sparse take on a large array with many patches, it might be worth looping over the take indices first to identify which chunks are relevant, then only loading patches for those chunks. Not a blocker, but worth a // TODO for future optimization.


✅ Things that look good

  • The transpose / scalar_at lane computation is consistent (both use absolute_index % n_lanes, where absolute_index is relative to the pre-offset chunk start, and n_lanes always divides 1024, so chunk boundaries don't break lane alignment).
  • apply_patches_primitive correctly guards against both lower and upper out-of-bounds patches.
  • The SliceReduce implementation is elegant — slicing always to chunk boundaries and tracking the sub-chunk offset is exactly right.
  • The compat fixture is good, covering multi-chunk with patches in different chunks.
  • Test coverage is thorough for the happy paths: rstest parameterization in slice.rs is especially nice.

@a10y

This comment was marked as resolved.

a10y added 5 commits March 27, 2026 15:08
a10y added 18 commits March 27, 2026 15:08
We call append_to_builder on the inner first, then just do a single pass
and overwrite what it just wrote.

@a10y a10y force-pushed the aduffy/patched-array branch from 8648244 to a9ce27f Compare March 27, 2026 19:33
Comment on lines +533 to +552
pub fn reinterpret<V: Sized>(&self) -> &[V] {
    assert!(
        self.is_aligned(Alignment::of::<V>()),
        "ByteBuffer not properly aligned to {}",
        type_name::<V>()
    );

    assert_eq!(
        self.length % size_of::<V>(),
        0,
        "ByteBuffer length not a multiple of the value length"
    );

    let v_len = self.length / size_of::<V>();
    let v_ptr = self.bytes.as_ptr().cast::<V>();

    // SAFETY: we checked that alignment and length are suitable to treat this as a &[V].
    unsafe { std::slice::from_raw_parts(v_ptr, v_len) }
}
}
Contributor

did we never do this before?

Contributor Author

I couldn't find anything, no

Contributor

shall we add a few more of these with different types? e.g. string?

Contributor Author

Executing anything other than primitive would fail

pub(super) len: usize,

/// lane offsets. The PType of these MUST be u32
pub(super) lane_offsets: BufferHandle,
Contributor

are you sure we don't want to have the option to compress these?

Comment on lines +35 to +36
/// Number of 1024-element chunks. Pre-computed for convenience.
pub(super) n_chunks: usize,
Contributor

why store this?

Contributor

this will increase the stack size and maybe register pressure. Is it stored to avoid computing / 1024?

/// Number of lanes the patch indices and values have been split into. Each of the `n_chunks`
/// of 1024 values is split into `n_lanes` lanes horizontally, each lane having 1024 / n_lanes
/// values that might be patched.
pub(super) n_lanes: usize,
Contributor

this too

Contributor Author

This I think we need to store in case we change the default ptype -> lane mapping in the future

Contributor

I think this kind of change would require new arrays. The data layout would change

}

#[derive(Clone, prost::Message)]
pub struct PatchedMetadata {
Contributor

I wonder if you can pack this into a u128?

Contributor Author

hm, is there a reason to do that over a struct? can protobuf even encode u128?

Contributor

I mean using fewer and smaller fields

/// in a filter or slice it may be sliced to the nearest chunk boundary.
#[prost(uint64, tag = "1")]
pub(crate) inner_len: u64,

Contributor

This is used only to fetch

Contributor Author

what do you mean by this?

Contributor

I am asking why we store this; is it likely because the inner may have a different len from the resulting patched array?

Comment on lines +306 to +309
array.indices = children.remove(0);
array.values = children.remove(0);

Ok(())
Contributor

in debug builds, check the dtypes?

a10y added 4 commits March 30, 2026 09:10

Labels

changelog/feature A new feature


3 participants